Abstract. The problem of classifying sonar signals from rocks and mines,
first studied by Gorman and Sejnowski, has become a benchmark against
which many learning algorithms have been tested. We discovered that
both the training set and the test set of this benchmark are linearly
separable, although with different hyperplanes. Moreover, the complete
set of learning and test patterns together is also linearly separable. We
give the weights that separate these sets, which may be used to compare
results found by other algorithms.



1 Introduction 

It has become common practice to test the performance of learning algorithms
on realistic benchmark problems. The underlying difficulty of such tests is that
in general these problems are not well characterized, making it impossible
to decide whether a better solution than the one already found exists.

The Sonar signals classification benchmark, introduced by Gorman et al. [6],
is widely used to test machine learning algorithms. In this problem the classifier
has to decide whether a given sonar return was produced by a metal cylinder
or by a cylindrically shaped rock in the same environment. The benchmark
contains 208 preprocessed sonar spectra, each defined by $N = 60$ real values,
with their corresponding class. Among these, $P = 104$ patterns are usually used
to determine the classifier parameters through a procedure called learning. Then,
the classifier is used to classify the $G = 104$ remaining patterns, and the fraction
of misclassified patterns is used to estimate the generalization error produced
by the learning algorithm. We applied Monoplane, a neural incremental learning
algorithm, to this benchmark. In this algorithm, hidden units are added
one after the other until the number of training errors vanishes. Each hidden unit
is a simple binary perceptron, trained with the learning algorithm Minimerror
[2].



2 The Incremental Learning Algorithm 



2.1 Definitions 

Consider a training set of $P$ input-output pairs $\{\xi^\mu, \tau^\mu\}$, where
$\mu = 1, 2, \dots, P$. The inputs $\xi^\mu = (1, \xi_1^\mu, \xi_2^\mu, \dots, \xi_N^\mu)$
may be binary or real valued $N + 1$ dimensional vectors. The first component
$\xi_0^\mu = 1$, the same for all the patterns, allows the bias to be treated as a
supplementary weight. The outputs are binary, $\tau^\mu = \pm 1$.
The neural network built by the learning algorithm has a single hidden layer of
$H$ binary neurons connected to the $N + 1$ input units, and one output neuron
connected to the hidden units. During training, the number of hidden neurons
grows until the number of training errors vanishes. After learning, the hidden
units $1 \le h \le H$ have synaptic weights $w_h = (w_{h0}, w_{h1}, \dots, w_{hN})$,
$w_{h0}$ being the bias of unit $h$. The output neuron has weights
$W = (W_0, W_1, \dots, W_H)$, where $W_0$ is the bias.

Given an input pattern $\xi$ that the network has to classify, the state $\sigma_h$ of
hidden neuron $h$ is given by:

$$\sigma_h = \mathrm{sign}\left(\sum_{i=0}^{N} w_{hi}\,\xi_i\right), \qquad h = 1, \dots, H \tag{1}$$

The network's output $\zeta$ is given by:

$$\zeta = \mathrm{sign}\left(\sum_{h=0}^{H} W_h\,\sigma_h\right) \tag{2}$$

with $\sigma_0 = 1$ for all the patterns.
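The two-stage classification described above, binary hidden states followed by a binary output, can be sketched in a few lines of Python. This is only an illustration: the function name `network_output`, the array layout, and the convention that a zero field is mapped to +1 are assumptions of this sketch, not part of the original algorithm description.

```python
import numpy as np

def network_output(xi, w_hidden, W_out):
    """Classify one pattern with a single hidden layer of binary units.

    xi       : shape (N + 1,), with xi[0] == 1 (bias input)
    w_hidden : shape (H, N + 1), row h holds the weights of hidden unit h
    W_out    : shape (H + 1,), W_out[0] is the output bias
    """
    # Hidden states: sign of the weighted sum of the inputs.
    # Assumption of this sketch: a field of exactly 0 is mapped to +1.
    sigma = np.where(w_hidden @ xi >= 0, 1, -1)
    # The extra component sigma_0 = 1 lets the output bias act as a weight.
    sigma = np.concatenate(([1], sigma))
    # Network output: sign of the weighted sum of the hidden states.
    return 1 if W_out @ sigma >= 0 else -1
```

For instance, with two hidden units that each look at the sign of one input and an output neuron with weights (-1, 1, 1), the network answers +1 only when both inputs are positive.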



2.2 Monoplane algorithm 

The Monoplane algorithm [1, 27] constructs a hidden layer in which each
appended hidden unit tries to correct the training errors of the previous hidden
unit. In fact, the construction of the hidden layer is similar to the construction
of the first layer of the parity machine [10, 11]. But, instead of introducing a
second hidden layer implementing the parity, our algorithm goes on adding hidden
units (if necessary). In the case of binary inputs it was proven [10] that a
solution exists with at most $P$ hidden neurons. A solution for real valued inputs
also exists, the upper bound to the number of hidden units being $P - 1$; the
proof is found in [5]. Thus, the Monoplane algorithm converges to a network of
finite size. Clearly, the upper bounds are not tight, and in practice the
algorithm constructs very small neural networks [27].

The final number $H$ of hidden units depends on the performance of the
learning algorithm used to train the individual binary perceptrons. The best
solution should endow the perceptron with the lowest generalization error if the
training set is linearly separable (LS), and should minimize the number of errors
otherwise. Most incremental strategies use the Pocket algorithm [16]. It has no
natural stopping condition, which is left to the user's patience. None of the
proposed alternative algorithms, such as [28], is guaranteed to find the best
solution to the learning problem. The success of our incremental algorithm relies
on the use of Minimerror to train the individual units. Minimerror is based on the
minimization of a cost function $E$ which depends on the weights $w$ through the
stabilities of the patterns of the training set. If $\xi^\mu$ is the input vector and
$\tau^\mu$ the corresponding target, then the stability $\gamma^\mu$ of pattern $\mu$ is a
continuous and differentiable function of the weights, given by:

$$\gamma^\mu = \tau^\mu\,\frac{w \cdot \xi^\mu}{\|w\|} \tag{3}$$

where $\|w\| = \sqrt{w \cdot w}$. The stability measures the distance of the pattern to the
separating hyperplane normal to $w$; it has a positive sign if the pattern is well
classified, negative otherwise. The cost function $E$ is given by:

$$E = \frac{1}{2} \sum_{\mu=1}^{P} \left[ 1 - \tanh\left(\frac{\gamma^\mu}{2T}\right) \right] \tag{4}$$




The contribution to $E$ of patterns with large negative stabilities is 1, i.e. they
are counted as 1 error, whereas the contribution of patterns with large positive
stabilities is vanishingly small. Patterns within a window of width $\approx 2T$ centered
on the hyperplane contribute to the cost function even if they have positive
stability, proportionally to $1 - \gamma/T$. It may be shown that $E$ may be interpreted
as a noisy measure, at temperature $T$, of the number of training errors [3].
The properties of its global minimum, studied theoretically with methods of
statistical mechanics [4], have been confirmed by numerical simulations [2, 7]. In
particular, the minimum of $E$ in the limit $T \to 0$ corresponds to the weights that
minimize the number of training errors. If the training set is LS, the weights that
separate the training set are not unique. It was shown that there is an optimal
learning temperature such that the minimum of the cost function endows the
perceptron with a generalization error numerically indistinguishable from the
optimal (Bayesian) value.
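The behaviour of the cost function can be checked numerically. The sketch below assumes the stability is $\gamma^\mu = \tau^\mu (w \cdot \xi^\mu)/\|w\|$ and the cost is $E = \frac{1}{2}\sum_\mu [1 - \tanh(\gamma^\mu / 2T)]$, the form consistent with the limits described above (a contribution of 1 for large negative stabilities, a vanishing contribution for large positive ones). The function names are hypothetical.

```python
import numpy as np

def stabilities(w, X, tau):
    """Signed distance of each pattern to the hyperplane normal to w.

    X   : shape (P, N + 1), one pattern per row, first column equal to 1
    tau : shape (P,), targets in {-1, +1}
    """
    return tau * (X @ w) / np.linalg.norm(w)

def cost(w, X, tau, T):
    """Assumed form of the cost: each pattern contributes between 0 and 1."""
    gamma = stabilities(w, X, tau)
    return 0.5 * np.sum(1.0 - np.tanh(gamma / (2.0 * T)))
```

At a very small temperature the value returned by `cost` approaches the number of misclassified patterns, while at larger temperatures correctly classified patterns close to the hyperplane also contribute.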

The algorithm Minimerror minimizes the cost E through a gradient descent, 
combined with a slow decrease of the temperature T equivalent to a deterministic 
annealing [2,7], and determines automatically the optimal temperature at which 
it has to stop. 
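To make the idea of gradient descent combined with a decreasing temperature concrete, here is a deliberately simplified sketch. It is not the Minimerror implementation: the Hebbian starting point, the geometric cooling schedule, the fixed learning rate, and the fixed final temperature are all assumptions of this example, whereas Minimerror determines its stopping temperature automatically. The cost is assumed to be $E = \frac{1}{2}\sum_\mu [1 - \tanh(\gamma^\mu / 2T)]$.

```python
import numpy as np

def cost_and_gradient(w, X, tau, T):
    """Return E(w) and dE/dw for E = 1/2 * sum_mu [1 - tanh(gamma_mu / 2T)]."""
    norm = np.linalg.norm(w)
    gamma = tau * (X @ w) / norm
    th = np.tanh(gamma / (2.0 * T))
    E = 0.5 * np.sum(1.0 - th)
    # dE/dgamma_mu = -(1 - tanh^2(gamma_mu / 2T)) / (4T)
    dE_dgamma = -(1.0 - th ** 2) / (4.0 * T)
    # dgamma_mu/dw = (tau_mu * xi_mu - gamma_mu * w / ||w||) / ||w||
    dgamma_dw = (tau[:, None] * X - np.outer(gamma, w) / norm) / norm
    return E, dE_dgamma @ dgamma_dw

def anneal(X, tau, T_start=1.0, T_end=0.05, cooling=0.9,
           steps_per_T=20, lr=0.1):
    """Gradient descent on E while T is lowered geometrically (assumed schedule)."""
    # Hebbian starting point; falls back to a constant vector if it vanishes.
    w = (tau[:, None] * X).sum(axis=0).astype(float)
    if not np.any(w):
        w = np.ones(X.shape[1])
    w /= np.linalg.norm(w)
    T = T_start
    while T > T_end:
        for _ in range(steps_per_T):
            _, grad = cost_and_gradient(w, X, tau, T)
            w -= lr * grad
            w /= np.linalg.norm(w)  # E only depends on the direction of w
        T *= cooling
    return w
```

Because the stabilities are invariant under a rescaling of the weights, the gradient is orthogonal to `w`, and renormalizing after each step does not change the cost.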



3 The Sonar Benchmark 

The set of examples contains 111 patterns obtained by bouncing sonar signals
off a metal cylinder at various angles, and 97 patterns obtained from rocks under
similar conditions. Each pattern is a set of 60 numbers in the range [0, 1]. Each
number represents the energy within a particular frequency band, integrated
over a certain period of time. The label associated with each pattern is
$\tau = +1$ if the signal corresponds to a rock and $\tau = -1$ if it is a mine
(metal cylinder).



SET                    N     P     G     $\tau = +1$   $\tau = -1$
Train                  60    104   104   55            49
Test                   60    104   104   42            62
Sonar (Test + Train)   60    208   -     97            111

Table 1. Number of patterns and distribution of classes



We have numbered each pattern with an absolute label $\mu$. In this way, the
Train set has $\mu$ from 1 to 104 and the Test set has $\mu$ from 105 to 208. This
identification allows each pattern to be analyzed individually. The procedure was
the following: learn the $P$ patterns of the Train set and measure the
generalization error on the $G$ patterns of the Test set. Then, learn the $P$
patterns of the Test set and measure the generalization error on the $G$ patterns
of the Train set. Finally, take the Sonar set (Train + Test) and try to learn it.
In this last case, of course, it is not possible to measure generalization. We
preprocessed the input vectors $\xi$ of the learning set through the following
normalization:

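$$\xi_i^\mu \leftarrow \frac{\xi_i^\mu - \langle \xi_i \rangle}{\Delta_i} \tag{5}$$

$$\langle \xi_i \rangle = \frac{1}{P} \sum_{\mu=1}^{P} \xi_i^\mu \tag{6}$$

$$\Delta_i^2 = \frac{1}{P} \sum_{\mu=1}^{P} \left( \xi_i^\mu - \langle \xi_i \rangle \right)^2 \tag{7}$$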

Some authors [24, 25] have reported that the learning set Train is linearly
separable. But most authors report results obtained through the backpropagation
algorithm (or its variants), and they find overly complex networks, with an
excessive number of parameters [26] (weights and units).

We found [1] that we needed two hidden units to learn the training set Train
without errors, but the generalization error was lower with only one unit (a
simple perceptron) than with two hidden units. This result is usually regarded
as overfitting, an ill-defined category used to describe this kind of behaviour.

A finer tuning of the parameters of Minimerror showed, however, that this
benchmark is linearly separable. In fact, the Train set, the Test set, and both
sets together (i.e. the $P + G = 208$ patterns of the Sonar set) are linearly
separable. In Tables 2, 3 and 4, we give the values of the perceptron weights that
separate each of the three sets Train, Test and Sonar.
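A published weight vector can be checked against a data set by verifying that every pattern falls on the correct side of the hyperplane. The sketch below shows one way to do this. It assumes that the first of the 61 weights is the bias (as in Section 2.1) and that the inputs have been preprocessed in the same way as during training; the helper names `add_bias` and `separates` are hypothetical.

```python
import numpy as np

def add_bias(X_raw):
    """Prepend the constant component xi_0 = 1 to every pattern."""
    ones = np.ones((X_raw.shape[0], 1))
    return np.hstack([ones, X_raw])

def separates(w, X, tau):
    """True if every pattern has a strictly positive aligned field under w.

    X   : shape (P, N + 1), patterns with the bias component included
    tau : shape (P,), targets in {-1, +1}
    """
    return bool(np.all(tau * (X @ w) > 0))
```

A call such as `separates(w, add_bias(X_raw), tau)` returns `True` only when the set is separated without error by `w`.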

In Table 5 we give the cosine of the angle $\alpha$ between the vector
$W_{Sonar}$ that separates the whole set and the vectors $W_{Train}$ and $W_{Test}$
that separate the Train and Test sets respectively. The cosine between $W_{Train}$
and $W_{Test}$ is also calculated, following equation (8).



W_Train = {

-0.0692, -1.5031, -1.9481, -0.2835, -1.0162, -0.2870, -0.5139, -0.3040,
2.3106, -0.4349, -0.6610, -1.0995, -1.2447, -1.3281, -0.7392, 0.6469,
1.7862, 1.2227, -0.0513, -0.6431, -0.8745, -0.8290, -0.8084, -0.6578,
-1.0453, -1.2332, -0.9860, -1.0617, -1.0097, -1.3597, -0.5245, 1.6822,
0.6588, -0.1056, -0.0794, 0.2998, 1.2290, 0.6709, -0.3025, 0.2681,
1.2375, 0.2485, 0.1098, 0.1693, -0.5717, -1.2458, -0.7116, -0.1323,
-1.3481, -2.6467, 1.0464, -0.7163, -0.8324, -0.4364, -1.1849, 1.3439,
0.4299, 1.0813, -0.9662, -0.3129, 0.0015

}

Table 2. Weights of the Minimerror-trained perceptron for the Train set



W_Test = {

-0.4035, -0.9738, 1.0107, 0.9301, -0.8997, -0.5649, 0.9318, 1.5102, 
0.0477, -1.6914, -1.3137, -2.0763, -2.0756, -0.5307, 0.8317, 1.4271, 
0.6112, 0.5119, 0.2081, -0.8285, -1.4488, -1.4337, -1.1908, -1.0213, 
-0.3653, 0.2701, 0.2465, -0.2028, -0.3975, -0.2049, 0.1843, 1.2486, 
0.3270, 0.2806, 0.4427, 0.7089, 1.5015, 1.5818, 0.2483, -0.6511, 
0.6822, 0.4056, -0.4476, -1.4451, -2.1873, -1.5600, -1.0694, -0.6042, 
-0.5170, -0.1298, 1.0330, -1.3454, -1.6560, 0.1098, -0.1249, -0.0331, 
-0.1748, 0.2088, -0.7949, -1.7304, 0.1419 

} 

Table 3. Weights of the Minimerror-trained perceptron for the Test set



W_Sonar = {

-0.0290, -0.7499, -0.0626, 0.5991, -0.1493, 0.2057, -0.3432, 0.5235, 
0.2407, -0.2480, -0.2069, 0.4854, -2.0100, 0.7256, 0.0868, -0.6282, 
1.0116, 0.8600, -0.8842, -0.1056, -1.4125, 1.7174, -2.1101, 0.3378, 
-0.8198, -0.1804, 1.4065, -2.2840, 1.4039, -0.1153, -2.6714, 3.3527, 
-1.0352, -1.1619, 1.4134, -0.6482, 0.3479, 0.9895, -0.3477, -0.5707, 
1.2758, -0.6628, 0.6288, -0.7920, -0.0850, -0.1348, -0.7794, 0.2451, 
-0.8392, -0.5660, 1.4128, -0.4471, -0.5439, -0.2079, -0.1840, -0.0060, 
0.2276, -0.0158, -0.2637, -0.1579, 0.1238 

} 



Table 4. Weights of the Minimerror-trained perceptron that separates the full Sonar
(Train + Test) dataset



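$$\cos(\alpha) = \frac{W_a \cdot W_b}{\|W_a\|\,\|W_b\|} \tag{8}$$

for any two weight vectors $W_a$ and $W_b$.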





                  (W_Sonar, W_Train)   (W_Sonar, W_Test)   (W_Train, W_Test)
$\cos(\alpha)$    0.51615              0.34238             0.4

Table 5. Cosine of the angle between pairs of weight vectors
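The cosine between two weight vectors is the usual normalized dot product, which can be sketched as follows. Whether the bias component should be included in the comparison is not stated in the text; this example simply uses all components passed to it, and the function name `cosine` is hypothetical.

```python
import numpy as np

def cosine(w_a, w_b):
    """Cosine of the angle between two weight vectors of equal length."""
    w_a = np.asarray(w_a, dtype=float)
    w_b = np.asarray(w_b, dtype=float)
    return float(w_a @ w_b / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))
```

A value close to 1 means the two hyperplanes are nearly parallel, while a value close to 0 means they are nearly orthogonal.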



In Table 6 (Test set), we give the distance to the separating hyperplane
$W_{Sonar}$ of each pattern misclassified by $W_{Train}$:
$\varepsilon_g = 19.2\%$ (15 F+, 5 F-). In Table 6 (Train set), we give the distance
to the separating hyperplane $W_{Sonar}$ of each pattern misclassified by
$W_{Test}$: $\varepsilon_g = 23.1\%$ (5 F+, 19 F-).



Test set

 i    $\mu$   Field          Distance to W_Sonar   $\tau$
 1    105     1.19697e-01    2.09029e-03           -1
 2    107     7.97467e-02    1.87343e-03           -1
 3    108     1.19431e-01    1.43453e-02           -1
 4    109     3.59889e-02    2.09760e-03           -1
 5    110     1.95963e-02    6.21865e-04           -1
 6    111     6.05680e-02    3.18180e-04           -1
 7    118     2.02768e-02    8.74281e-03           -1
 8    122     3.07859e-02    2.66651e-02           -1
 9    131     6.87472e-02    7.54780e-03           -1
10    133     1.37587e-02    5.23483e-03           -1
11    135     4.35705e-03    3.23781e-04           -1
12    136     7.91603e-03    1.01329e-02           -1
13    138     2.35263e-02    1.13658e-02           -1
14    142     2.22331e-02    7.53167e-03           -1
15    143     2.36318e-02    6.63512e-03           -1
16    168    -1.34434e-02    8.57956e-03            1
17    170    -8.20828e-02    1.96977e-03            1
18    197    -4.87395e-02    7.08260e-05            1
19    202    -2.91468e-03    1.02479e-02            1
20    203    -8.11795e-02    4.08223e-02            1

Train set

 i    $\mu$   Field          Distance to W_Sonar   $\tau$
 1    5       1.60234e-02    1.99901e-03           -1
 2    6       2.76646e-02    1.98919e-03           -1
 3    9       1.63374e-02    5.77784e-05           -1
 4    26      1.90089e-02    3.73223e-05           -1
 5    39      5.77398e-02    2.28692e-04           -1
 6    51     -5.05614e-02    3.23683e-02            1
 7    53     -1.35299e-01    2.60559e-03            1
 8    55     -4.13985e-02    2.24705e-03            1
 9    57     -7.71201e-02    2.30675e-03            1
10    58     -2.70548e-02    2.46977e-03            1
11    61     -3.02880e-02    2.57994e-03            1
12    62     -3.16819e-02    1.94132e-02            1
13    64     -6.23916e-02    7.15758e-03            1
14    65     -4.00952e-02    2.49840e-03            1
15    66     -2.10826e-01    2.43180e-03            1
16    72     -7.44215e-02    2.31141e-02            1
17    73     -2.71180e-02    2.67136e-02            1
18    77     -1.34531e-01    2.44142e-03            1
19    82     -2.26242e-01    1.82462e-03            1
20    83     -5.91598e-02    2.07447e-03            1
21    84     -5.80561e-02    4.04605e-04            1
22    97     -4.57645e-02    4.56010e-04            1
23    98     -6.72313e-02    1.06360e-04            1
24    100    -1.83484e-02    3.35972e-03            1

Table 6. Bad (misclassified) patterns of the Test and Train sets
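The generalization errors quoted for Table 6 follow from simple counting: 20 misclassified patterns out of 104 give about 19.2%, and 24 out of 104 give about 23.1%. A minimal sketch of this count is given below. It reads F+ as patterns of class -1 that receive a positive field and F- as patterns of class +1 that receive a negative field, which is an interpretation inferred from the signs in the table; the function name is hypothetical.

```python
import numpy as np

def generalization_error(w, X, tau):
    """Fraction of misclassified patterns, with false positive/negative counts.

    X   : shape (G, N + 1), patterns with the bias component included
    tau : shape (G,), targets in {-1, +1}
    """
    field = X @ w
    # Assumption of this sketch: a field of exactly 0 is classified as +1.
    predicted = np.where(field >= 0, 1, -1)
    false_pos = int(np.sum((predicted == 1) & (tau == -1)))
    false_neg = int(np.sum((predicted == -1) & (tau == 1)))
    eps_g = (false_pos + false_neg) / len(tau)
    return eps_g, false_pos, false_neg
```

Applied to the Test set with the weights learned on the Train set, such a function would be expected to return the fraction 20/104 together with the two error counts.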



4 Discussion and Conclusion 



In this paper, we have shown that the sonar benchmark is linearly separable.
Both sets, Train and Test, are linearly separable, but with different hyperplanes,
which gives rise to a certain generalization error $\varepsilon_g$. We have also
found solutions to the three sets using the perceptron learning rule [29], and we
have found that the resulting generalization error is higher than the error
obtained with Minimerror. The weight vector for the complete set, Sonar, could be
used as a test for learning algorithms able to find the best separating
hyperplane.